93 research outputs found
Deepfake Detection of Occluded Images Using a Patch-based Approach
DeepFake involves the use of deep learning and artificial intelligence
techniques to produce or alter video and image content, typically generated by
GANs. It can be misused and can lead to fictitious news and ethical and
financial crimes, and it also degrades the performance of facial recognition
systems. Detecting whether an image is real or fake is therefore important,
especially for authenticating the originality of people's images or videos. One
of the most important challenges in this area is occlusion, which reduces
system precision. In this study, we present a deep learning approach that uses
the entire face and face patches to distinguish real from fake images in the
presence of occlusion, with a three-path decision: first, entire-face
reasoning; second, a decision based on the concatenation of the feature vectors
of the face patches; and third, a majority-vote decision based on these
features. To test our approach,
new datasets including real and fake images are created. To produce fake
images, StyleGAN and StyleGAN2 are trained on FFHQ images, and StarGAN and
PGGAN are trained on CelebA images. The CelebA and FFHQ datasets are used as
real images. The proposed approach reaches better results in early epochs than
other methods and improves on the SoTA results by 0.4%-7.9% across the
constructed datasets. Our experimental results also show that weighting the
patches may improve accuracy.
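The patch-level voting mentioned in this abstract can be illustrated with a minimal sketch. It assumes each face patch has already been classified as "real" or "fake" by some model; the function names, the tie-breaking rule, and the example weights are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common patch label.

    Ties are broken in favour of "fake" (an assumption made for this sketch).
    """
    counts = Counter(labels)
    top = max(counts.values())
    winners = [label for label, count in counts.items() if count == top]
    return "fake" if "fake" in winners else winners[0]


def weighted_vote(labels, weights):
    """Sum a (hypothetical) per-patch weight for each label and pick the largest."""
    score = {}
    for label, weight in zip(labels, weights):
        score[label] = score.get(label, 0.0) + weight
    return max(score, key=score.get)


patch_labels = ["real", "fake", "fake"]
plain_decision = majority_vote(patch_labels)
# Giving the first patch a large weight lets it outvote the other two.
weighted_decision = weighted_vote(patch_labels, [0.7, 0.1, 0.1])
```

With equal weights the two functions agree; unequal weights let a trusted (for example, unoccluded) patch dominate the decision.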
A Novel Scheme for Intelligent Recognition of Pornographic Images
Harmful content on the internet is increasing day by day, which motivates
further research into fast and reliable filtering of obscene and immoral
material. Pornographic image recognition is an important component of any
filtering system. In this paper, a new approach for detecting pornographic
images is introduced. In this approach, two new features are suggested. These
two features, in combination with other simple traditional features, provide
good discrimination between porn and non-porn images. In addition, we applied
fuzzy-integral-based information fusion to combine MLP (Multi-Layer Perceptron)
and NF (Neuro-Fuzzy) outputs. To test the proposed method, the performance of
the system was evaluated on 18354 images downloaded from the internet. The
attained rates were 93% TP and 8% FP on the training dataset, and 87% TP and
5.5% FP on the test dataset. The achieved results verify the performance of the
proposed system compared with other related works.
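As an illustration of fuzzy-integral fusion of two classifier outputs, the following minimal sketch computes a Sugeno integral. The abstract does not say which fuzzy integral was used, so the choice of Sugeno, the fuzzy-measure values, and the confidence scores below are all assumptions made for the example.

```python
def sugeno_integral(scores, measure):
    """Fuse classifier confidences with a Sugeno fuzzy integral.

    scores:  dict mapping a source name to its confidence in [0, 1].
    measure: dict mapping a frozenset of source names to a fuzzy-measure
             value in [0, 1] (hypothetical values chosen by the caller).
    """
    # Visit sources from the most to the least confident.
    ordered = sorted(scores, key=scores.get, reverse=True)
    fused = 0.0
    subset = set()
    for source in ordered:
        subset.add(source)
        # Each step is limited both by the confidence and by how much
        # the sources seen so far are trusted as a group.
        fused = max(fused, min(scores[source], measure[frozenset(subset)]))
    return fused


scores = {"mlp": 0.9, "nf": 0.6}
measure = {
    frozenset({"mlp"}): 0.7,
    frozenset({"nf"}): 0.5,
    frozenset({"mlp", "nf"}): 1.0,
}
fused_score = sugeno_integral(scores, measure)
```

The fused score always lies between the lowest and highest input confidence, which is what makes the integral usable as a soft combination rule.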
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning
Recently, the attention-enriched encoder-decoder framework has aroused great
interest in image captioning due to its overwhelming progress. Many visual
attention models directly leverage meaningful regions to generate image
descriptions. However, seeking a direct transition from visual space to text is
not enough to generate fine-grained captions. This paper exploits a
feature-compounding approach to bring together high-level semantic concepts and
visual information regarding the contextual environment fully end-to-end. Thus,
we propose a stacked cross-modal feature consolidation (SCFC) attention network
for image captioning in which we simultaneously consolidate cross-modal
features through a novel compounding function in a multi-step reasoning
fashion. Besides, we jointly employ spatial information and context-aware
attributes (CAA) as the principal components in our proposed compounding
function, where our CAA provides a concise context-sensitive semantic
representation. To make better use of the consolidated features' potential, we
further propose an SCFC-LSTM as the caption generator, which can leverage
discriminative semantic information through the caption generation process. The
experimental results indicate that our proposed SCFC can outperform various
state-of-the-art image captioning methods in terms of popular metrics on the
MSCOCO and Flickr30K datasets.
A Cascade Transformer-based Model for 3D Dose Distribution Prediction in Head and Neck Cancer Radiotherapy
Radiation therapy is the primary method used to treat cancer in the clinic.
Its goal is to deliver a precise dose to the planning target volume (PTV) while
protecting the surrounding organs at risk (OARs). However, the traditional
workflow used by dosimetrists to plan the treatment is time-consuming and
subjective, requiring iterative adjustments based on their experience. Deep
learning methods can be used to predict dose distribution maps to address these
limitations. This study proposes a cascade model for organs-at-risk segmentation
and dose distribution prediction. An encoder-decoder network has been developed
for the segmentation task, in which the encoder consists of transformer blocks,
and the decoder uses multi-scale convolutional blocks. Another cascade
encoder-decoder network has been proposed for dose distribution prediction
using a pyramid architecture. The proposed model has been evaluated using an
in-house head and neck cancer dataset of 96 patients and OpenKBP, a public head
and neck cancer dataset of 340 patients. The segmentation subnet achieved 0.79
and 2.71 for Dice and HD95 scores, respectively. This subnet outperformed the
existing baselines. The dose distribution prediction subnet outperformed the
winner of the OpenKBP2020 competition with 2.77 and 1.79 for dose and DVH
scores, respectively. The predicted dose maps showed good agreement with the
ground truth, and the agreement improved when the prediction was linked with
the auxiliary segmentation task. The proposed model outperformed
state-of-the-art methods, especially in regions with low prescribed doses.
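For readers unfamiliar with the Dice score reported for the segmentation subnet, here is a minimal sketch of the standard Dice coefficient on flattened binary masks. The function name and the convention for two empty masks are assumptions of this example, not details from the paper.

```python
def dice_score(pred, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks are treated as a perfect match (a common convention,
    # assumed here).
    return 1.0 if total == 0 else 2.0 * intersection / total


predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 1, 0]
score = dice_score(predicted_mask, reference_mask)
```

A score of 1 means the predicted and reference masks coincide exactly, and 0 means they do not overlap at all.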
Automatic Multi-Class Cardiovascular Magnetic Resonance Image Quality Assessment using Unsupervised Domain Adaptation in Spatial and Frequency Domains
Population imaging studies rely upon good quality medical imagery before
downstream image quantification. This study provides an automated approach to
assess image quality from cardiovascular magnetic resonance (CMR) imaging at
scale. We identify four common CMR imaging artefacts, including respiratory
motion, cardiac motion, Gibbs ringing, and aliasing. The model can deal with
images acquired in different views, including two, three, and four-chamber
long-axis and short-axis cine CMR images. Two deep learning-based models in
the spatial and frequency domains are proposed. Besides recognising these
artefacts, the proposed models address the common challenge of not
having access to data labels. An unsupervised domain adaptation method and a
Fourier-based convolutional neural network are proposed to overcome these
challenges. We show that the proposed models reliably allow for CMR image
quality assessment. The accuracies obtained for the spatial model with
supervised and weakly supervised learning are 99.41±0.24 and 96.37±0.66,
respectively, on the UK Biobank dataset. Using unsupervised domain adaptation can partly
overcome the challenge of not having access to the data labels. The maximum
achieved domain gap coverage in unsupervised domain adaptation is 16.86%.
Domain adaptation can significantly improve a 5-class classification task and
deal with considerable domain shift without data labels. Increasing the speed
of training and testing can be achieved with the proposed model in the
frequency domain. The frequency-domain model achieves the same accuracy while
running 1.548 times faster than the spatial model. This model can also be used
directly on k-space data, with no need for image reconstruction.
Comment: 21 pages, 9 figures, 7 tables
A Generalised Deep Meta-Learning Model for Automated Quality Control of Cardiovascular Magnetic Resonance Images
Background and Objectives: Cardiovascular magnetic resonance (CMR) imaging is
a powerful modality in functional and anatomical assessment for various
cardiovascular diseases. Sufficient image quality is essential to achieve
proper diagnosis and treatment. The large number of medical images, the variety
of imaging artefacts, and the workload of imaging centres all point to the
need for automatic image quality assessment (IQA). However,
automated IQA requires access to bulk annotated datasets for training deep
learning (DL) models. Labelling medical images is a tedious, costly and
time-consuming process, which creates a fundamental challenge in proposing
DL-based methods for medical applications. This study aims to present a new
method for CMR IQA when there is limited access to annotated datasets. Methods:
The proposed generalised deep meta-learning model can evaluate the quality by
learning tasks in the prior stage and then fine-tuning the resulting model on a
small labelled dataset of the desired tasks. This model was evaluated on the
data of over 6,000 subjects from the UK Biobank for five defined tasks,
including detecting respiratory motion, cardiac motion, aliasing and Gibbs
ringing artefacts, and images without artefacts. Results: The results of
extensive experiments show the superiority of the proposed model. Moreover,
comparing the model's accuracy with that of the domain adaptation model shows a
significant difference when using only 64 annotated images related to the
desired tasks. Conclusion: The proposed model can identify unknown artefacts in
images with acceptable accuracy, which makes it suitable for medical
applications and quality assessment of large cohorts.
Comment: 16 pages, 1 figure, 2 tables
SOMM: A New Service Oriented Middleware for Generic Wireless Multimedia Sensor Networks Based on Code Mobility
Although much research has been done on Wireless Multimedia Sensor Networks (WMSNs) in recent years, programming sensor nodes is still time-consuming and tedious. It requires expertise in low-level programming, mainly because of the resource-constrained hardware and the low-level APIs provided by current operating systems. The code of the resulting systems typically has no clear separation between application and system logic. This minimizes the possibility of reusing code and often makes major changes necessary when the underlying platform changes. In this paper, we present a service-oriented middleware named SOMM to support application development for WMSNs. The main goal of SOMM is to enable the development of modifiable and scalable WMSN applications. A network that uses SOMM can provide multiple services to multiple clients at the same time with the specified Quality of Service (QoS). SOMM uses a virtual machine that supports mobile agents. Services in SOMM are provided by mobile agents, and SOMM also provides a tuple space on each node which agents can use to communicate with each other.
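To illustrate how agents might communicate through a shared per-node space, the following minimal sketch assumes a Linda-style tuple space with pattern matching. The class, its method names, and the example tuple are hypothetical and do not describe SOMM's actual interface.

```python
class TupleSpace:
    """A tiny in-memory, Linda-style tuple space (illustrative only)."""

    def __init__(self):
        self._tuples = []

    def out(self, item):
        """Publish a tuple into the space."""
        self._tuples.append(tuple(item))

    @staticmethod
    def _matches(pattern, item):
        # None in the pattern acts as a wildcard for that field.
        return len(pattern) == len(item) and all(
            p is None or p == v for p, v in zip(pattern, item)
        )

    def read(self, pattern):
        """Return the first matching tuple without removing it, or None."""
        for item in self._tuples:
            if self._matches(pattern, item):
                return item
        return None

    def take(self, pattern):
        """Return and remove the first matching tuple, or None."""
        item = self.read(pattern)
        if item is not None:
            self._tuples.remove(item)
        return item


space = TupleSpace()
# One agent publishes a result; another retrieves it by pattern.
space.out(("frame", 7, "node-3"))
seen = space.read(("frame", None, None))
taken = space.take(("frame", None, None))
remaining = space.read(("frame", None, None))
```

Because producers and consumers only share tuple patterns, neither needs to know the other's identity or location, which suits agents that migrate between nodes.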